Journal of Medical Internet Research — Latest Matching Preprints

1

Dynamic Topic Alignment and Sentiment between Official Health Communication and General Public Discourse during COVID-19: A Comprehensive Infoveillance Framework

Yin, S.; Xin, W.; Chen, S.; Ge, Y.

2026-05-27 public and global health 10.64898/2026.05.23.26353966 medRxiv

Top 0.1%

80.4%

Social media has become a critical channel for public health communication during the COVID-19 pandemic, yet how official health messaging aligns with broader public discourse remains insufficiently understood. This study develops an end-to-end info-veillance framework to examine the dynamic relationship between Centers for Disease Control and Prevention (CDC) communications and general public discourse on social media. We analyzed 17,524 CDC tweets and 67,895 public discourse tweets. Biterm Topic Model (BTM) was used to extract topics from each corpus, and a novel topic consistency scoring system integrating cosine similarity with daily public topic prominence was developed to quantify temporal alignment between official health communication and public discourse. Two complementary sentiment measures were incorporated: expected sentiment (average emotional tone) and net sentiment (overall emotional intensity). Temporal relationships were examined using autoregressive integrated moving average with exogenous variables (ARIMAX) models. Results show that topic alignment increased over time across CDC topics, while expected sentiment remained consistently negative. Higher alignment was associated with immediate and delayed changes in expected sentiment and stronger emotional intensity in net sentiment based on ARIMAX results. These findings suggest that topic alignment reflects public attention rather than agreement with official communications, and is associated with more negative emotional responses. This framework provides a scalable, generalizable approach to investigate and evaluate public engagement with official health communication.
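The abstract above describes a topic consistency score that combines cosine similarity with daily public topic prominence, but it does not give the formula. The sketch below is one plausible reading, not the authors' method: it assumes each topic is a word-probability vector over a shared vocabulary, and that the daily score for one CDC topic is the prominence-weighted mean of its cosine similarity to every public topic. The function names and the weighting scheme are illustrative assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction
    return dot / (norm_u * norm_v)


def daily_consistency(official_topic, public_topics, daily_prominence):
    """Prominence-weighted mean similarity (assumed form, not from the paper).

    official_topic   -- word-probability vector of one official topic
    public_topics    -- list of word-probability vectors, one per public topic
    daily_prominence -- share of that day's public tweets assigned to each topic
    """
    total = sum(daily_prominence)
    if total == 0:
        return 0.0  # no public activity on this day
    weighted = sum(
        weight * cosine_similarity(official_topic, topic)
        for topic, weight in zip(public_topics, daily_prominence)
    )
    return weighted / total


# Toy example: the official topic matches public topic 0 exactly and is
# orthogonal to public topic 1; topic 0 holds 75% of the day's attention.
score = daily_consistency([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.75, 0.25])
```

Under this assumed weighting, the score rises when the public spends more of a given day on topics that resemble the official one, which is consistent with the abstract's reading of alignment as attention rather than agreement.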

2

Trend and co-occurrence network study of symptoms through social media: an example of COVID-19

Wu, J.; Wang, L.; Hua, Y.; Li, M.; Zhou, L.; Bates, D. W.; Yang, J.

2022-09-29 public and global health 10.1101/2022.09.28.22280462 medRxiv

Top 0.1%

72.4%

Importance: COVID-19 is a multi-organ disease with broad-spectrum manifestations. Clinical data-driven research can be difficult because many patients do not receive prompt diagnoses, treatment, and follow-up studies. Social media's accessibility, promptness, and rich information provide an opportunity for large-scale and long-term analyses, enabling a comprehensive symptom investigation to complement clinical studies. Objective: Present an efficient workflow to identify and study the characteristics and co-occurrences of COVID-19 symptoms using social media. Design, Setting, and Participants: This retrospective cohort study analyzed 471,553,966 COVID-19-related tweets from February 1, 2020, to April 30, 2022. A comprehensive lexicon of symptoms was used to filter tweets through rule-based methods. 948,478 tweets with self-reported symptoms from 689,551 Twitter users were identified for analysis. Main Outcomes and Measures: The overall trends of COVID-19 symptoms reported on Twitter were analyzed (separately by the Delta strain and the Omicron strain) using weekly new numbers, overall frequency, and temporal distribution of reported symptoms. A co-occurrence network was developed to investigate relationships between symptoms and affected organ systems. Results: The weekly quantity of self-reported symptoms has a high consistency (0.8528, P<0.0001) and one-week leading trend (0.8802, P<0.0001) with new infections in four countries. We grouped 201 common symptoms (mentioned ≥ 10 times) into 10 affected systems. The frequency of symptoms showed dynamic changes as the pandemic progressed, from typical respiratory symptoms in the early stage to more musculoskeletal and nervous symptoms at later stages.
When comparing symptoms reported during the Delta strain versus the Omicron variant, significant changes were observed, with lower odds ratios of coma (95%CI 0.55-0.49, P<0.01) and anosmia (95%CI, 0.6-0.56), and more pain in the throat (95%CI, 1.86-1.96) and concentration problems (95%CI, 1.58-1.70). The co-occurrence network characterizes relationships among symptoms and affected systems, both intra-systemic, such as cough and sneezing (respiratory), and inter-systemic, such as alopecia (integumentary) and impotence (reproductive). Conclusions and Relevance: We found dynamic COVID-19 symptom evolution through self-reporting on social media and identified 201 symptoms from 10 affected systems. This demonstrates that social media's prevalence trends and co-occurrence networks can efficiently identify and study public health problems, such as common symptoms during pandemics. Key Points. Question: What are the epidemic characteristics and relationships of COVID-19 symptoms that have been extensively reported on social media? Findings: This retrospective cohort study of 948,478 related tweets (February 2020 to April 2022) from 689,551 users identified 201 self-reported COVID-19 symptoms from 10 affected systems, mitigating the potential missing information in hospital-based epidemiologic studies due to many patients not being diagnosed and treated in a timely manner. Coma, anosmia, altered sense of taste, and dyspnea were less common in participants infected during Omicron prevalence than in Delta. Symptoms that affect the same system have high co-occurrence. Frequent co-occurrences occurred between symptoms and systems corresponding to specific disease progressions, such as palpitations and dyspnea, alopecia and impotence. Meaning: Trend and network analysis in social media can mine dynamic epidemic characteristics and relationships between symptoms in emergent pandemics.
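The co-occurrence network in the abstract above can be illustrated with a small edge-counting routine. This is a minimal sketch under stated assumptions rather than the authors' pipeline: it assumes each tweet has already been reduced to a list of symptom terms by the lexicon filter, and it counts an undirected edge once per tweet for every pair of distinct symptoms. The function name and the toy tweets are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(tweet_symptoms):
    """Count how many tweets mention each unordered pair of symptoms.

    tweet_symptoms -- iterable of symptom lists, one list per tweet
    Returns a Counter keyed by alphabetically ordered (symptom_a, symptom_b).
    """
    edges = Counter()
    for symptoms in tweet_symptoms:
        # Deduplicate within a tweet and sort so (a, b) and (b, a) share a key.
        for pair in combinations(sorted(set(symptoms)), 2):
            edges[pair] += 1
    return edges


# Hypothetical pre-extracted symptoms from three tweets.
tweets = [
    ["cough", "sneezing"],
    ["cough", "sneezing", "fever"],
    ["fever"],
]
edges = cooccurrence_counts(tweets)
```

The resulting counts can serve as edge weights in a graph whose nodes are symptoms; grouping nodes by affected organ system would then separate intra-systemic from inter-systemic edges.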

3

Infoveillance study on the dynamic associations between CDC social media contents and epidemic measures during COVID-19

Yin, S.; Chen, S.; Ge, Y.

2023-06-27 health informatics 10.1101/2023.06.26.23291921 medRxiv

Top 0.1%

66.4%

Background: Health agencies have been widely adopting social media to disseminate important information, educate the public on emerging health issues, and understand public opinions. The Centers for Disease Control and Prevention (CDC) has been one of the leading agencies using social media platforms during the COVID-19 pandemic to communicate with the public and mitigate the disease in the United States. It is crucial to understand the relationships between CDC's social media communication and the actual epidemic metrics to improve public health agencies' communication strategies during health emergencies. Objective: The aim of this study was to identify key topics in tweets posted by CDC during the pandemic, to investigate the temporal dynamics between these key topics and the actual COVID-19 epidemic measures, and to make recommendations for CDC's digital health communication strategies for future health emergencies. Methods: Two types of data were collected: 1) a total of 17,524 COVID-19-related English tweets posted by the CDC between December 7, 2019 and January 15, 2022; 2) COVID-19 epidemic measures in the U.S. from the public GitHub repository of Johns Hopkins University from January 2020 to July 2022. Latent Dirichlet allocation (LDA) topic modeling was applied to identify key topics from all COVID-19-related tweets posted by CDC, and the final topics were determined by domain experts. Various multivariate time series analysis techniques were applied between each of the identified key topics and actual COVID-19 epidemic measures to quantify the dynamic associations between these two types of time series data. Results: Four major topics from CDC's COVID-19 tweets were identified: 1) information on prevention of health outcomes of COVID-19; 2) pediatric intervention and family safety; 3) updates of the epidemic situation of COVID-19; 4) research and community engagement to curb COVID-19.
Multivariate analyses showed that there were significant variabilities of progression between CDC's topics and the actual COVID-19 epidemic measures. Some CDC topics showed substantial associations with the COVID-19 measures over different time spans throughout the pandemic, expressing similar temporal dynamics between these two types of time series data. Conclusions: Our study is the first to comprehensively investigate the dynamic associations between topics discussed by CDC on Twitter and the COVID-19 epidemic measures in the U.S. We identified four major topic themes via topic modeling and explored how each of these topics was associated with each major epidemic measure by performing various multivariate time series analyses. We recommend that public health agencies, such as CDC, disseminate and update timely and accurate information to the public and align major topics with the key epidemic measures over time. We suggest that social media can help public health agencies to inform the public on health emergencies and to mitigate them effectively.

4

Impact of a Social Media Derived Digital Self Management Platform on Population Level Irritable Bowel Syndrome Emergency Utilization: A Controlled Interrupted Time Series Analysis Using South Korean National Health Insurance Data

Park, J.-H.; Lim, A.

2026-03-23 health informatics 10.64898/2026.03.20.26348871 medRxiv

Top 0.1%

62.6%

Background: Irritable bowel syndrome (IBS) contributes disproportionately to gastrointestinal-related emergency department (ED) utilization in South Korea, yet evidence on population-level interventions informed by patient-generated digital discourse remains limited. Recent social media analyses have identified dominant thematic concerns among IBS patients, including dietary triggers, symptom management, psychosocial burden, and information-seeking, suggesting actionable targets for digital self-management tools. Objective: To evaluate the population-level impact of the Jang Geongang ("Gut Health") digital self-management platform, whose content architecture was informed by topic modeling of IBS-related social media discourse, on IBS-attributed ED visits and unplanned hospitalizations, using a controlled interrupted time series (CITS) design. Methods: We analyzed monthly aggregate claims data from South Korea's National Health Insurance Service (NHIS) spanning January 2018 to December 2024 (84 monthly observations). The Jang Geongang platform was launched in four pilot metropolitan areas (Seoul, Incheon, Daejeon, Gwangju) in July 2021, with eight non-pilot metropolitan areas serving as concurrent controls. Segmented regression with Newey-West heteroskedasticity and autocorrelation consistent (HAC) standard errors was used to estimate changes in level and trend of IBS-attributed ED visits per 100,000 insured population. Sensitivity analyses included autoregressive integrated moving average (ARIMA) transfer function models, varying pre-intervention windows, and leave-one-out control exclusion. Results: The CITS model estimated an immediate level change of -3.42 IBS-attributed ED visits per 100,000 (95% CI: -5.18 to -1.66, p < 0.001) following platform launch, and a change in monthly trend of -0.19 visits per 100,000 per month (95% CI: -0.31 to -0.07, p = 0.003), compared to control areas.
By December 2024, the cumulative estimated reduction was 10.5 ED visits per 100,000 (23.8% relative reduction). Effects were concentrated in younger adults (19-39 years; level change: -5.14, p < 0.001) and IBS-D subtype visits (level change: -4.87, p < 0.001). ARIMA transfer function models corroborated these findings (immediate impact: -3.28, p = 0.001). Unplanned hospitalizations showed a smaller but significant reduction (level change: -0.84 per 100,000, p = 0.018). Conclusions: A digital self-management platform designed using insights from social media-derived IBS patient discourse was associated with sustained population-level reductions in IBS-attributed emergency utilization. Controlled interrupted time series analysis provides robust evidence for the public health impact of translating social media analytics into scalable digital health interventions.

5

Twitter activity about treatments during the COVID-19 pandemic: case studies of remdesivir, hydroxychloroquine, and convalescent plasma.

Hamamsy, T. C.; Bonneau, R.

2020-07-13 public and global health 10.1101/2020.06.18.20134668 medRxiv

Top 0.1%

62.4%

Since the COVID-19 pandemic started, the public has been eager for news about promising treatments, and social media has played a large role in information dissemination. In this paper, our objectives are to characterize the public discussion of treatments on Twitter, and demonstrate the utility of these discussions for public health surveillance. We pulled tweets related to three promising COVID-19 treatments (hydroxychloroquine, remdesivir and convalescent plasma), between the dates of February 28th and May 22nd using the Twitter public API. We characterize treatment tweet trends over this time period. Most major tweet/retweet/sentiment trends correlated with public announcements made by the White House and/or with new clinical trial evidence about treatments. Most of the websites people shared in treatment-related tweets were non-scientific media sources that leaned conservative. Hydroxychloroquine was the most discussed treatment on Twitter, and over 10% of hydroxychloroquine tweets mentioned an adverse drug reaction. There is a gap between the public attention/discussion around COVID-19 treatments and their evidence. Twitter data can and should be used for public health surveillance during this pandemic, as it is informative for monitoring adverse drug reactions, especially as many people avoid going to hospitals/doctors.

6

Unmasking the conversation on masks: Natural language processing for topical sentiment analysis of COVID-19 Twitter discourse

Sanders, A.; White, R.; Severson, L.; Ma, R.; McQueen, R.; Alcanatara Paulo, H. C.; Zhang, Y.; Erickson, J. S.; Bennett, K. P.

2020-09-01 health informatics 10.1101/2020.08.28.20183863 medRxiv

Top 0.1%

61.3%

In this exploratory study, we scrutinize a database of over one million tweets collected from March to July 2020 to illustrate public attitudes towards mask usage during the COVID-19 pandemic. We employ natural language processing, clustering and sentiment analysis techniques to organize tweets relating to mask-wearing into high-level themes, then relay narratives for each theme using automatic text summarization. In recent months, a body of literature has highlighted the robustness of trends in online activity as proxies for the sociological impact of COVID-19. We find that topic clustering based on mask-related Twitter data offers revealing insights into societal perceptions of COVID-19 and techniques for its prevention. We observe that the volume and polarity of mask-related tweets have greatly increased. Importantly, the analysis pipeline presented may be leveraged by the health community for qualitative assessment of public response to health intervention techniques in real time.

7

Mining Twitter to Assess the Determinants of Health Behavior towards Palliative Care in the United States

Zhao, Y.; Zhang, H.; Huo, J.; Guo, Y.; Wu, Y.; Bian, J.

2020-03-30 health informatics 10.1101/2020.03.26.20038372 medRxiv

Top 0.1%

59.1%

Palliative care is a specialized service with proven efficacy in improving patients' quality of life. Nevertheless, lack of awareness and misunderstanding limit its adoption. Research is urgently needed to understand the determinants (e.g., knowledge) related to its adoption. Traditionally, these determinants are measured with questionnaires. In this study, we explored Twitter to reveal these determinants guided by the Integrated Behavioral Model. A secondary goal is to assess the feasibility of extracting user demographics from Twitter data--a significant shortcoming in existing studies that limits our ability to explore more fine-grained research questions (e.g., gender difference). Thus, we collected, preprocessed, and geocoded palliative care-related tweets from 2013 to 2019 and then built classifiers to: 1) categorize tweets into promotional vs. consumer discussions, and 2) extract user gender. Using topic modeling, we explored whether the topics learned from tweets are comparable to responses to palliative care-related questions in the Health Information National Trends Survey.

8

Selective tweeting of COVID-19 articles: Does title or abstract positivity influence dissemination?

Fabiano, N.; Hallgrimson, Z.; Wong, S.; Salameh, J.-P.; Kazi, S.; Unni, R. R.; Treanor, L.; Frank, R.; Prager, R.; McInnes, M. D.

2021-06-24 health informatics 10.1101/2021.06.22.21259354 medRxiv

Top 0.1%

55.1%

Background: Previous research has shown that articles may be cited more frequently on the basis of title or abstract positivity. Whether a similar selective sharing practice exists on Twitter is not well understood. The objective of this study was to assess if COVID-19 articles with positive titles or abstracts were tweeted more frequently than those with non-positive titles or abstracts. Methods: COVID-19 related articles published between January 1st and April 14th, 2020 were extracted from the LitCovid database and all articles were screened for eligibility. Titles and abstracts were classified using a list of positive and negative words from a previous study. A negative binomial regression analysis controlling for confounding variables (2018 impact factor, open access status, continent of the corresponding author, and topic) was performed to obtain regression coefficients, with the p values obtained by likelihood ratio testing. Results: A total of 3752 COVID-19 articles were included. Of the included studies, 44 titles and 112 abstracts were positive; 1 title and 7 abstracts were negative; and 3707 titles and 627 abstracts were neutral. Articles with positive titles had a lower tweet rate relative to articles with non-positive titles, with a regression coefficient of -1.10 (P < .001), while the positivity of the abstract did not impact tweet rate (P = .2218). Conclusion: COVID-19 articles with non-positive titles are preferentially tweeted, while abstract positivity does not influence tweet rate.

9

Global spatiotemporal trends and determinants of COVID-19 vaccine acceptance on Twitter: a multilingual deep learning study in 135 countries and territories

Zhou, X.; Zhang, X.; Larson, H. J.; de Figueiredo, A.; Jit, M.; Fodeh, S.; Vermund, S. H.; Zang, S.; Lin, L.; Hou, Z.

2022-11-17 health informatics 10.1101/2022.11.14.22282300 medRxiv

Top 0.1%

51.1%

Background: COVID-19 vaccination has faced a range of challenges from supply-side barriers such as insufficient vaccine supply and negative information environment and demand-side barriers centring on public acceptance and confidence in vaccines. This study assessed global spatiotemporal trends in demand- and supply-side barriers to vaccine uptake using COVID-19-related social media data and explored the country-level determinants of vaccine acceptance. Methods: We accessed a total of 13,093,406 tweets sent between November 2020 and March 2022 about the COVID-19 vaccine in 90 languages from 135 countries using Meltwater® (a social listening platform). Based on 8,125 manually-annotated tweets, we fine-tuned multilingual deep learning models to automatically annotate all 13,093,406 tweets. We present spatial and temporal trends in four key spheres: (1) COVID-19 vaccine acceptance; (2) confidence in COVID-19 vaccines; (3) the online information environment regarding the COVID-19 vaccine; and (4) perceived supply-side barriers to COVID-19 vaccination. Using univariate and multilevel regressions, we evaluated the association between COVID-19 vaccine acceptance on Twitter® and (1) country-level characteristics regarding governance, pandemic preparedness, trust, culture, social development, and population demographics; (2) country-level COVID-19 vaccine coverage; and (3) Google® search trends on adverse vaccine events. Findings: COVID-19 vaccine acceptance was high among Twitter® users in Southeast Asian, Eastern Mediterranean, and Western Pacific countries, including India, Indonesia, and Pakistan. In contrast, acceptance was relatively low in high-income nations like South Korea, Japan, and the Netherlands. Spatial variations were correlated with country-level governance, pandemic preparedness, public trust, culture, social development, and demographic determinants.
At the country level, vaccine acceptance sentiments expressed on Twitter® predicted higher vaccine coverage. We noted the declining trend of COVID-19 vaccine acceptance among global Twitter® users since March 2021, which was associated with increased searches for adverse vaccine events. Interpretation: In future pandemics, new vaccines may face the potential low-level and declining trend in acceptance, like COVID-19 vaccines, and early responses are needed. Social media mining represents a promising surveillance approach to monitor vaccine acceptance and can be validated against real-world vaccine uptake data. Funding: National Natural Science Foundation of China.

10

The COVID-19 Infodemic: The complex task of elevating signal and eliminating noise.

DESAI, T.; Conjeevaram, A.

2021-01-20 medical education 10.1101/2021.01.19.21249936 medRxiv

Top 0.1%

47.3%

In Situation Report #3 and 39 days before declaring COVID-19 a pandemic, the WHO declared a -19 infodemic. The volume of coronavirus tweets was far too great for one to find accurate or reliable information. Healthcare workers were flooded with noise, which drowned the signal of valuable COVID-19 information. To combat the infodemic, physicians created healthcare-specific micro-communities to share scientific information with other providers. We analyzed the content of eight physician-created communities and categorized each message in one of five domains. We coded 1) an application programming interface to download tweets and their metadata in JavaScript Object Notation and 2) a reading algorithm using Visual Basic for Applications in Excel to categorize the content. We superimposed the publication date of each tweet onto a timeline of key pandemic events. Finally, we created NephTwitterArchive.com to help healthcare workers find COVID-19-related signal tweets when treating patients. We collected 21071 tweets from the eight hashtags studied. Only 9051 tweets were considered signal: tweets categorized into both a domain and subdomain. There was a trend towards fewer signal tweets as the pandemic progressed, with a daily median of 22% (IQR 0-42%). The most popular subdomain in Prevention was PPE (2448 signal tweets). In Therapeutics, Hydroxychloroquine/chloroquine with or without Azithromycin and Mechanical Ventilation were the most popular subdomains. During the active Infodemic phase (Days 0 to 49), a total of 2021 searches were completed in NephTwitterArchive.com, which was a 26% increase from the same time period before the pandemic was declared (Days -50 to -1). The COVID-19 Infodemic indicates that future endeavors must be undertaken to eliminate noise and elevate signal in all aspects of scientific discourse on Twitter.
In the absence of any algorithm-based strategy, healthcare providers will be left with the nearly impossible task of manually finding high-quality tweets from amongst a tidal wave of noise.

11

Social Media Reveals Psychosocial Effects of the COVID-19 Pandemic

Saha, K.; Torous, J.; Caine, E. D.; De Choudhury, M.

2020-10-26 psychiatry and clinical psychology 10.1101/2020.08.07.20170548 medRxiv

Top 0.1%

46.0%

Background: The novel coronavirus disease 2019 (COVID-19) pandemic has caused several disruptions in personal and collective lives worldwide. The uncertainties surrounding the pandemic have also led to multi-faceted mental health concerns, which can be exacerbated by precautionary measures such as social distancing and self-quarantining, as well as societal impacts such as economic downturn and job loss. Despite this being noted as a "mental health tsunami," the psychological effects of the COVID-19 crisis remain unexplored at scale. Consequently, public health stakeholders are currently limited in identifying ways to provide timely and tailored support during these circumstances. Objective: Our work aims to provide insights regarding people's psychosocial concerns during the COVID-19 pandemic by leveraging social media data. We aim to study the temporal and linguistic changes in symptomatic mental health and support-seeking expressions in the pandemic context. Methods: We obtain ~60M Twitter streaming posts originating from the U.S. from March 24 to May 25, 2020, and compare these with ~40M posts from a comparable period in 2019 to causally attribute the effect of COVID-19 on people's social media self-disclosure. Using these datasets, we study people's self-disclosure on social media in terms of symptomatic mental health concerns and expressions seeking support. We employ transfer learning classifiers that identify the social media language indicative of mental health outcomes (anxiety, depression, stress, and suicidal ideation) and support (emotional and informational support). We then examine the changes in psychosocial expressions over time and language, comparing the 2020 and 2019 datasets. Results: We find that all of the examined psychosocial expressions have significantly increased during the COVID-19 crisis - mental health symptomatic expressions have increased by ~14%, and support-seeking expressions have increased by ~5%, both thematically related to COVID-19.
We also observe a steady decline and eventual plateauing in these expressions during the COVID-19 pandemic, which may have been due to habituation or due to supportive policy measures enacted during this period. Our language analyses highlight that people express concerns that are contextually related to the COVID-19 crisis. Conclusions: We studied the psychosocial effects of the COVID-19 crisis by using social media data from 2020, finding that people's mental health symptomatic and support-seeking expressions significantly increased during the COVID-19 period as compared to similar data from 2019. However, this effect gradually lessened over time, suggesting that people adapted to the circumstances and their "new normal". Our linguistic analyses revealed that people expressed mental health concerns regarding personal and professional challenges, healthcare and precautionary measures, and pandemic-related awareness. This work shows the potential to provide insights to mental healthcare stakeholders and policymakers in planning and implementing measures to mitigate mental health risks amidst the health crisis.

12

Deep learning-based detection of COVID-19 using wearables data

Bogu, G. K.; Snyder, M. P.

2021-01-09 infectious diseases 10.1101/2021.01.08.21249474 medRxiv

Top 0.1%

44.2%

Background: COVID-19 is an infectious disease caused by SARS-CoV-2 that is primarily diagnosed using laboratory tests, which are frequently not administered until after symptom onset. However, SARS-CoV-2 is contagious multiple days before symptom onset and diagnosis, thus enhancing its transmission through the population. Methods: In this retrospective study, we collected 15 seconds to one-minute heart rate and steps interval data from Fitbit devices during the COVID-19 period (February 2020 until June 2020). Resting heart rate was computed by selecting the heart rate intervals where steps were zero for 12 minutes ahead of an interrogated time point. Data for each participant was divided into train or baseline by taking the days before the non-infectious period and test data by taking the days during the COVID-19 infectious period. Data augmentation was used to increase the size of the training days. Here, we developed a deep learning approach based on a Long Short-Term Memory Networks-based autoencoder, called LAAD, to predict COVID-19 infection by detecting abnormal resting heart rate in test data relative to the user's baseline. Findings: We detected an abnormal resting heart rate during the period of viral infection (7 days before the symptom onset and 21 days after) in 92% (23 out of 25 cases) of patients with laboratory-confirmed COVID-19. In 56% (14) of cases, LAAD detection identified cases in their pre-symptomatic phase whereas 36% (9 cases) were detected after the onset of symptoms with an average precision score of 0·91 (SD 0·13, 95% CI 0·854-0·967), a recall score of 0·36 (0·295, 0·232-0·487), and a F-beta score of 0·79 (0·226, 0·693-0·888). In COVID-19 positive patients, abnormal RHR patterns start 5 days before symptom onset (6·9 days in pre-symptomatic cases and 1·9 days later in post-symptomatic cases).
COVID-19+ patients have longer abnormal resting heart rate periods (89 hours or 3·7 days) as compared to healthy individuals (25 hours or 1·1 days). Interpretation: These findings show that deep learning neural networks and wearables data are an effective method for the early detection of COVID-19 infection. Additional validation data will help guide the use of this and similar techniques in real-world infection surveillance and isolation policies to reduce transmission and end the pandemic. Funding: This work was supported by NIH grants and gifts from the Flu Lab, as well as departmental funding from the Stanford Genetics department. The Google Cloud Platform costs were covered by Google for Education academic research and COVID-19 grant awards. Research in context. Evidence before the study: COVID-19 resulted in up to 1·7 million deaths worldwide in 2020. COVID-19 detection using laboratory tests is usually performed after symptom onset. This delay can allow the spread of viral infection and can cause outbreaks. We searched PubMed, Google, and Google Scholar for research articles published in English up to Dec 1, 2020, using common search terms including "COVID-19 and wearables", "Resting heart rate and viral infection", "Resting heart rate and COVID-19", "machine learning and COVID-19" and "deep-learning and COVID-19". Previous studies have attempted to use an elevated resting heart rate as an indicator of viral infection. Although these studies have investigated the early prediction of COVID-19 using resting heart rate and other wearables data, studies investigating a deep learning-based prediction model with performance evaluation metrics at the user level have not been reported. Added value of this study: In this study, we created a deep-learning system that used wearables data such as abnormal resting heart rate to predict COVID-19 before the symptom onset.
The deep-learning system was created using retrospective time-series datasets collected from 25 COVID-19+ patients, 11 non-COVID-19, and 70 healthy individuals. To our knowledge, this is the first deep-learning model to identify an early viral infection using wearables data at the user level. This study also greatly extends our previous phase-1 study and factors in unpredictable behavior and the time-series nature of the data, limited data size, and lack of data labels to evaluate performance metrics. The use of a real-time version of this model using more data along with user feedback may help to scale early detection as the number of patients with COVID-19 continues to grow. Implications of all the available evidence: In the future, wearable devices may provide high-resolution sleep, temperature, saturated oxygen, respiration rate, and electrocardiogram, which could be used to further characterize an individual's baseline and improve the deep-learning model performance for infectious disease detection. Using multi-sensor data with a real-time deep-learning model has the potential to alert individuals of illness prior to symptom onset and may greatly reduce the viral spread.
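The resting heart rate rule stated in the Methods of the abstract above (heart rate intervals where steps were zero for 12 minutes) can be sketched as a simple filter. This is an illustrative reading and not the authors' code: it assumes one sample per minute, and it interprets the 12-minute window as the 12 consecutive minutes ending at the interrogated time point. The function name and the tuple layout are hypothetical.

```python
def resting_heart_rates(samples, window_minutes=12):
    """Keep heart-rate samples preceded by a full window of zero steps.

    samples -- list of (minute, heart_rate, steps) tuples, one per minute
    Returns a list of (minute, heart_rate) pairs that count as resting.
    """
    resting = []
    for minute, heart_rate, _ in samples:
        # Steps recorded in the window_minutes minutes ending at this minute.
        window = [
            steps
            for other_minute, _, steps in samples
            if minute - window_minutes < other_minute <= minute
        ]
        # Require a complete window with no movement at all.
        if len(window) == window_minutes and all(s == 0 for s in window):
            resting.append((minute, heart_rate))
    return resting


# Toy series: 15 minutes of data, with movement only at minute 2.
series = [(m, 60 + m, 10 if m == 2 else 0) for m in range(15)]
resting = resting_heart_rates(series)
```

In this toy series only minute 14 has a complete 12-minute window that excludes the movement at minute 2, so it is the only sample retained. Real Fitbit data sampled at 15-second intervals would need resampling or a time-based window instead of the per-minute assumption used here.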

13

Denoising Longitudinal Social Media for Pandemic Monitoring

Lin, S.; Garay, L.; Hua, Y.; Guo, Z.; Xu, X.; Yang, J.

2024-06-30 public and global health 10.1101/2024.06.29.24309690 medRxiv

Top 0.1%

43.6%

Objective: Current studies leveraging social media data for disease monitoring face challenges like noisy colloquial language and insufficient tracking of user disease progression in longitudinal data settings. This study aims to develop a pipeline for collecting, cleaning, and analyzing large-scale longitudinal social media data for disease monitoring, with a focus on the COVID-19 pandemic. Materials and Methods: The pipeline begins by screening COVID-19 cases from tweets spanning February 1, 2020, to April 30, 2022. Longitudinal data is collected for each patient, two months before and three months after self-reporting. Symptoms are extracted using Named Entity Recognition (NER), followed by denoising with a combination of Graph Convolutional Network (GCN) and Bidirectional Encoder Representations from Transformers (BERT) model to retain only User Symptom Mentions (USM). Subsequently, symptoms are mapped to standardized medical concepts using the Unified Medical Language System (UMLS). Finally, this study conducts symptom pattern analysis and visualization to illustrate temporal changes in symptom prevalence and co-occurrence. Results: This study identified 191,096 self-reported COVID-19-positive cases from COVID-19-related tweets and retrospectively collected 811,398,280 historical tweets, of which 2,120,964 contained symptom information. After denoising, 39% (832,287) of symptom-sharing tweets reflected user-related mentions. The trained USM model achieved an F1 score of 0.926. Further analysis revealed a higher prevalence of upper respiratory tract symptoms during the Omicron period compared to the Delta and wild-type periods. Additionally, there was a pronounced co-occurrence of lower respiratory tract and nervous system symptoms in the wild-type strain and Delta variant. Conclusion: This study established a robust framework for pandemic monitoring via social media, integrating denoising of user-related symptom mentions and longitudinal data. 
The findings underscore the importance of denoising procedures in revealing accurate prevalence trends, thereby minimizing biases in symptom analysis.
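
Illustrative note (not part of the abstract): the symptom co-occurrence analysis mentioned above can be pictured as counting how often pairs of symptoms appear for the same user. The sketch below is a minimal version of that idea, assuming each user has already been reduced to a list of normalized symptom labels; the function name `cooccurrence_counts` and the example data are hypothetical and are not taken from the study, whose actual implementation is not described here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(user_symptoms):
    """Count how many users report each unordered pair of symptoms."""
    counts = Counter()
    for symptoms in user_symptoms:
        # De-duplicate and sort so each pair has a single canonical ordering.
        for pair in combinations(sorted(set(symptoms)), 2):
            counts[pair] += 1
    return counts


# Hypothetical, made-up data: one list of symptom labels per user.
example_users = [
    ["cough", "fever", "headache"],
    ["cough", "fever"],
    ["headache", "fatigue"],
]
pair_counts = cooccurrence_counts(example_users)
```

Grouping users by time window (for example, by variant period) before calling the function would give one co-occurrence table per period, which is one plausible way to compare periods.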

14

Context-Aware Digital Phenotyping of Youth Mental Health Using Mobile Ecological Prospective Assessments of Smartphone Use

Patel, J.; Tolulope Ibrahim, S.; Katapally, T. R.

2025-08-26 psychiatry and clinical psychology 10.1101/2025.08.24.25334320 medRxiv

Top 0.1%

43.4%

Background: Youth mental disorders affect 12-14% of adolescents globally and remain underdiagnosed and undertreated. Digital phenotyping offers a scalable approach to real-time behavioural monitoring via smartphones, yet most studies rely solely on passive measures such as screen time, overlooking contextual factors. Methods: This cross-sectional study was part of the Smart Platform, a digital citizen science initiative that engaged youth aged 13-21 years. Participants completed a baseline survey on sociodemographic characteristics and mental health (depression, anxiety, and suicidal ideation). Over the next seven days, context-aware digital phenotyping was conducted, defined as the collection of ecologically valid, time-stamped behavioural data from personal devices. This was implemented through mobile ecological prospective assessments (mEPAs) to capture self-reported smartphone use context, including activity type, location, and social setting. Multivariable logistic regression assessed associations between smartphone use context and mental health, adjusting for sociodemographic covariates. Results: Eighty-four youth completed the baseline survey and at least one mEPA. A higher proportion of smartphone use at home was associated with lower odds of depression (OR=0.105, 95% CI: 0.028-0.276) and anxiety (OR=0.150, 0.053-0.345). A greater proportion of smartphone use while alone was associated with higher odds of depression (OR=3.802, 1.622-11.241), as was a greater proportion of time spent internet surfing (OR=2.663, 1.238-6.843). Longer duration of smartphone use outside the home was associated with higher odds of depression (OR=4.289, 1.443-16.579). Conclusion: Context-aware smartphone metrics may offer more informative digital phenotyping indicators of youth mental health than duration alone, supporting integration of multi-context measures into early detection and precision prevention frameworks.

15

Sleep health education and a personalized smartphone application improve sleep and productivity and reduce healthcare utilization among employees: Results of a randomized clinical trial

Robbins, R.; Weaver, M. D.; Quan, S.; Sullivan, J. P.; Qadri, S.; Czeisler, C. A.; Barger, L. K.

2021-10-16 occupational and environmental health 10.1101/2021.10.13.21264974 medRxiv

Top 0.1%

43.2%

Sleep deficiency and undiagnosed or untreated sleep disorders are pervasive among employed adults, yet often ignored in the context of workplace health promotion programs among employers. Smartphone applications (apps) are a promising, scalable approach to improving sleep among employees. In this randomized clinical trial, we evaluate the dayzz app, a personalized sleep training program that promotes healthy sleep and sleep disorders awareness through personalized, comprehensive sleep improvement solutions. In a sample of daytime employees affiliated with a large healthcare organization, we evaluated the dayzz app in a parallel-group, randomized, waitlist control trial. Participants were randomly assigned either to use the dayzz app throughout the study or to the waitlist control condition, in which they would receive the dayzz app at the end of the study period. We collected data on employee sleep (e.g., sleep duration, sleep health behavioral changes); workplace outcomes (e.g., employee presenteeism, absenteeism, and performance); and healthcare utilization (e.g., mental health, ambulatory visits, and emergency room visits), throughout the study. Results show that those assigned to the experimental condition exhibited an increase in healthy sleep behaviors; an increase in sleep duration; a trend toward a more regular sleep schedule; and a significant increase in overall sleep quality. Regarding workplace outcomes, results showed that those in the experimental condition also demonstrated a trend toward less absenteeism and significantly lower presenteeism; and those in the experimental condition reported lower healthcare utilization. Results from this randomized clinical trial demonstrate that a workplace sleep wellness program can be beneficial to both the employee and employer.

16

An Analysis of Self-reported Longcovid Symptoms on Twitter

Singh, S. M.; Reddy, C.

2020-08-15 public and global health 10.1101/2020.08.14.20175059 medRxiv

Top 0.1%

42.3%

Objectives: A majority of patients suffering from acute COVID-19 are expected to recover symptomatically and functionally. However, there are reports that some people continue to experience symptoms even beyond the stage of acute infection. This phenomenon has been called longcovid. Study design: This study attempted to analyse symptoms reported by users on Twitter self-identifying as having longcovid. Methods: The search was carried out using the Twitter public streaming application programming interface using a relevant search term. Results: We could identify 89 users with usable data in the tweets posted by them. A majority of users described multiple symptoms, the most common of which were fatigue, shortness of breath, pain and brainfog/concentration difficulties. The most common course of symptoms was episodic. Conclusions: Given the public health importance of this issue, the study suggests that there is a need to better study post-acute COVID symptoms.

17

Public Perception of COVID-19 Vaccines on Twitter in the United States

Xie, Z.; Wang, X.; Jiang, Y.; Chen, Y.; Huang, S.; Ma, H.; Anand, A.; Li, D.

2021-10-18 infectious diseases 10.1101/2021.10.16.21265097 medRxiv

Top 0.1%

42.2%

Background: COVID-19 vaccines play a vital role in combating the COVID-19 pandemic. Social media provides a rich data source to study public perception of COVID-19 vaccines. Objective: In this study, we aimed to examine public perception and discussion of COVID-19 vaccines on Twitter in the US, as well as geographic and demographic characteristics of Twitter users who discussed COVID-19 vaccines. Methods: Through the Twitter streaming Application Programming Interface (API), COVID-19-related tweets were collected from March 5th, 2020 to January 25th, 2021 using relevant keywords (such as "corona", "covid19", and "covid"). Based on geolocation information provided in tweets and vaccine-related keywords (such as "vaccine" and "vaccination"), we identified COVID-19 vaccine-related tweets from the US. Topic modeling and sentiment analysis were performed to examine public perception and discussion of COVID-19 vaccines. Demographic inference using a computer vision algorithm (DeepFace) was performed to infer the demographic characteristics (age, gender and race/ethnicity) of Twitter users who tweeted about COVID-19 vaccines. Results: Our longitudinal analysis showed that the discussion of COVID-19 vaccines on Twitter in the US reached a peak at the end of 2020. The average sentiment score for COVID-19 vaccine-related tweets remained relatively stable during our study period except for two large peaks: a positive peak corresponding to optimism about the development of COVID-19 vaccines and a negative peak corresponding to worry about the availability of COVID-19 vaccines. COVID-19 vaccine-related tweets from east coast states showed relatively high sentiment scores. Twitter users from east, west and southern states of the US, as well as male users and users in the age group 30-49 years, were more likely to discuss COVID-19 vaccines on Twitter. 
Conclusions: Public discussion and perception of COVID-19 vaccines on Twitter were influenced by vaccine development and the pandemic, and varied with the geography and demographics of Twitter users.

18

Development, System Design, Safety, and Performance Metrics of a Conversational Agent for Reducing Depressive and Anxious Symptoms Based on a Large Language Model: The MHAI Study

Villarreal-Zegarra, D.; Paredes-Gonzales, Y.; Damaso-Roman, A.; Quinones-Inga, J.; Centeno-Terrazas, G.; Lozada, Y. P. A.-M.

2025-09-24 psychiatry and clinical psychology 10.1101/2025.09.22.25336411 medRxiv

Top 0.1%

41.8%

Background: Conversational agents based on large language models (LLMs) have shown moderate efficacy in reducing depressive and anxiety symptoms. However, most existing evaluations lack methodological transparency, rely on closed-source models, and show limited standardization in performance and safety assessment. Objective: We have two study objectives: (1) to develop an LLM-based conversational agent through system design analysis and initial functionality testing, and (2) to evaluate the safety and performance of two LLMs (GPT-4o and Llama 3.1-8B) through standardized assessment in controlled simulated interactions focused on depression and anxiety. Methods: We conducted a cross-sectional study in two phases. First, we developed a mental health platform integrating a conversational agent with functionalities including personalized context, pretrained therapeutic modules, self-assessment tools, and an emergency alert system. Second, we evaluated the agent's responses in simulated interactions based on predefined user personas for each LLM. Four expert raters assessed 816 interaction pairs using a 5-criterion Likert scale evaluating tone, clarity, domain accuracy (correctness), robustness, completeness, boundaries, target language, and safety. In addition, we used performance metrics based on numerical criteria such as cost, response length, and number of tokens. Multiple linear regression models were used to compare LLM performance and assess metric interrelations. Results: First, we developed a web-based mental health platform using a user-centered design, structured into frontend, backend, and database layers. The system integrates therapeutic chat (GPT-4o and Llama 3.1-8B), psychological assessments (PHQ-9, GAD-7), CBT-based tasks, and an emergency alert system. The platform supports secure user authentication, data encryption, multilingual access, and session tracking. 
Second, GPT-4o outperformed Llama 3.1-8B in both performance metrics based on numerical criteria and Likert scale criteria, generating longer and more lexically diverse responses, using more tokens, and scoring higher in clarity, robustness, completeness, boundaries, and target language. However, it incurred higher costs, and there were no significant differences in tone, accuracy, or safety. Conclusion: Our study presents a conversational agent with multiple functionalities and shows that GPT-4o outperforms Llama 3.1-8B in performance, although at a higher cost. This platform could be used in future clinical trials or real-world implementation studies.

19

User Perceptions of Individually-Tailored Health Information in Digital Apps: Development of a Scale

Ownby, R. L.; Davenport, R.; Caballero, J.

2025-10-27 psychiatry and clinical psychology 10.1101/2025.10.25.25338794 medRxiv

Top 0.1%

41.1%

Objectives: Individually tailored health information is thought to have greater effects on patient behavior than generic advice because it is more personally relevant. Most digital health studies, however, do not actually measure the effect of tailoring on study outcomes. To address this gap, we created the Success in Tailoring (SIT) scale, which assesses how users perceive information as relevant, useful, and actionable. Methods: The SIT items were chosen to reflect theoretical work on relevance and elements of the Elaboration Likelihood Model. The scale was administered to participants in a study of a mobile app providing tailored information about chronic disease self-management to persons 40 years of age and older with low health literacy. Participants responded immediately after completing the study intervention and again three months later. Psychometric analyses focused on the measure's reliability, factor structure, and convergent and divergent validity relative to other measures thought to be related and unrelated to it. We assessed test-retest reliability and factorial invariance over administration, and whether the measure predicted changes in key study outcomes. Results: Analyses were based on responses from 275 participants. The SIT's internal consistency was good, and test-retest reliability was acceptable. Exploratory factor analysis suggested a single-factor solution, although subsequent confirmatory analyses revealed that a bifactor solution with a robust general factor and two minor subfactors fit the data best. The scale was significantly correlated with measures related to its underlying concept and unrelated to measures not related to it, such as physical and cognitive status. Configural and metric, but not scalar, factorial invariance were confirmed. SIT scores were related to change in activation and disease-management self-efficacy over the course of the study. 
Confirmatory bifactor analyses supported treating the SIT as essentially unidimensional, with a single total score providing a reliable and valid index of perceived tailoring. Conclusion: The SIT gives researchers a straightforward means of capturing whether participants feel information is successfully tailored to them. It may be helpful in explaining how personalization, a key feature of many digital health apps, may be related to outcomes.

20

Tracking Inflammation in Real Time Following Vaccination: Validation of a Novel Individualized Digital Inflammatory Biomarker Relative to Serum Biomarkers

Dave, D.; Heumann, R.; Wegerich, S.; Sekaric, J.; Oostendorp, J.; Paris, R.; Ward, M. P.; Steinhubl, S.

2025-10-17 primary care research Community evaluation 10.1101/2025.10.13.25337893 medRxiv

Top 0.1%

41.0%

Background: Inflammatory changes underlie many diseases and therapeutic interventions, making accurate tracking of inflammation critical for clinical evaluation of disease course and therapy response. Traditional methods like fever detection and serum biomarkers are limited by imprecision and invasiveness. Collection of self-reported symptoms after vaccination is a common vaccine trial endpoint, but is prone to bias. Wearable sensors offer a promising alternative by detecting subtle physiological changes over time. Prior studies show they identify transient post-vaccine inflammation but lack validation relative to serum biomarkers. Methods: This study included 61 volunteers who were administered one of four mRNA vaccines (1 or 2 doses) for a total of 80 doses. Participants wore a torso sensor patch for 14 days beginning seven days before vaccination, and the patch data were used to derive an individualized digital biomarker of inflammation, the inflammatory multivariate change index (iMCI). The reference outcome was a serum biomarker panel collected at baseline and five post-vaccination timepoints. Self-reported reactogenicity symptoms were tracked daily for 7 days starting the day of vaccination. The correlation between total iMCI response within 48 hours following vaccination and maximal change in select serum biomarkers was determined, along with their relationship to reactogenicity. Findings: There was a moderate to strong positive Spearman correlation between total iMCI and change in C-reactive protein (CRP) (0.59, p < 0.01) and interferon gamma (IFN-γ) (0.56, p < 0.01) across vaccine types and vaccine doses, similar to the correlation between CRP and IFN-γ (0.60, p < 0.01). The associations with self-reported systemic reactogenicity were only moderate for all three: 0.48 (p < 0.01) for iMCI, 0.34 (p = 0.01) for interferon gamma, and 0.36 (p < 0.01) for C-reactive protein. 
Interpretation: The personalized multivariate inflammatory digital biomarker derived from wearable sensor data can quantify an individual's inflammatory response to vaccination as an alternative to serial serum biomarker testing. This scalable, non-invasive approach can enable real-time monitoring of the onset, duration, and severity of inflammation. Funding: Moderna, Inc.
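
Illustrative note (not part of the abstract): the Spearman correlation reported above is the Pearson correlation computed on ranks rather than raw values, which makes it sensitive to monotonic rather than strictly linear association. The sketch below is a minimal pure-Python version, assuming tied values receive the average of their rank positions; the helper names `average_ranks`, `pearson`, and `spearman` are hypothetical, and any inputs are made up rather than taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In practice a statistics library would normally be used, since it also supplies the p-value; this version only shows what the coefficient measures.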